Fast Duplicated Documents Detection using Multi-level Prefix-filter
Abstract
Duplicate document detection is the problem of rapidly finding all document pairs whose similarity is equal to or greater than a given threshold. A recently proposed method called prefix-filter uses the number of uncommon terms (words or characters) in a document pair to identify pairs whose similarity can never reach the threshold, and removes them before the similarity calculation. However, prefix-filter does not reduce the number of similarity calculations sufficiently, because it still leaves many document pairs whose similarity is below the threshold. In this paper, we propose multi-level prefix-filter, which applies multiple different prefix-filters to reduce the number of similarity calculations more effectively while retaining the advantages of prefix-filter (no detection loss and no extra parameters).
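The filtering idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes documents are token sets compared by Jaccard similarity, uses the standard prefix length |x| - ceil(t|x|) + 1 for threshold t, and assumes that "multiple different prefix-filters" means filters built on different global token orderings, which the abstract does not spell out. All function names (`make_order`, `prefix`, `candidate_pairs`, and so on) and the hash-based orderings are hypothetical.

```python
import hashlib
import math
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_order(salt):
    """Return a sort key that defines one fixed global token ordering.

    Assumption: salted hashes stand in for 'different prefix-filters'.
    """
    return lambda token: hashlib.sha256((salt + token).encode()).hexdigest()


def prefix(tokens, threshold, order_key):
    """First |x| - ceil(t*|x|) + 1 tokens of x under the given ordering."""
    ordered = sorted(tokens, key=order_key)
    # Small epsilon guards against float error such as 0.7 * 10 > 7.
    required = math.ceil(threshold * len(ordered) - 1e-9)
    return set(ordered[: len(ordered) - required + 1])


def candidate_pairs(docs, threshold, order_key):
    """Single prefix-filter: keep pairs whose prefixes share a token."""
    prefixes = {name: prefix(toks, threshold, order_key)
                for name, toks in docs.items()}
    return {(a, b) for a, b in combinations(sorted(docs), 2)
            if prefixes[a] & prefixes[b]}


def multi_level_candidates(docs, threshold, order_keys):
    """Keep only the pairs that pass every one of the prefix-filters."""
    result = None
    for key in order_keys:
        pairs = candidate_pairs(docs, threshold, key)
        result = pairs if result is None else result & pairs
    return result if result is not None else set()


def duplicates(docs, threshold, order_keys):
    """Verify the surviving candidates with the exact similarity."""
    return {(a, b)
            for a, b in multi_level_candidates(docs, threshold, order_keys)
            if jaccard(docs[a], docs[b]) >= threshold}


# Toy data (made up for illustration only).
docs = {
    "d1": {"a", "b", "c", "d", "e"},
    "d2": {"a", "b", "c", "d", "f"},
    "d3": {"x", "y", "z", "a"},
    "d4": {"a", "b", "c", "d", "e"},
}
orders = [make_order("level-1"), make_order("level-2"), make_order("level-3")]
found = duplicates(docs, 0.6, orders)
```

Under these assumptions, each individual filter is lossless: a pair whose similarity reaches the threshold always shares a token in its prefixes, whatever the ordering. Intersecting the candidate sets of several filters therefore never discards a true duplicate, and no parameter beyond the threshold is introduced. How much the intersection prunes depends on the data and on the orderings chosen.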